import numpy
import pandas
from plotly import express
from plotly import io
io.templates.default = 'ggplot2'
age = numpy.round(
numpy.random.normal(
loc=50,
scale=10,
size=100
),
decimals=0
)
dbp = numpy.round(
(0.8 * age) + numpy.random.normal(
loc=50,
scale=10,
size=100
),
decimals=0
)
df = pandas.DataFrame(
{'Age':age,
'Diastolic BP':dbp,
'Treatment group':numpy.where(dbp>90, 'Placebo', 'Treatment')}
)
express.scatter(
df,
x='Age',
y='Diastolic BP',
color='Treatment group',
hover_name='Treatment group',
title='Diastolic blood pressure as a function of age'
).update_traces(
marker=dict(
size=12,
opacity=0.8,
line=dict(
width=2,
color='DarkSlateGrey'
)
),
selector=dict(
mode='markers'
)
)
Information about the course
Information for this course is provided in Table 1.
| Title | Information |
|---|---|
| Instructor | Dr Jay H Klopper |
| Contact Dr Klopper | juanklopper@gwu.edu |
| Teaching assistant | Annie Allred |
| Contact Annie | annie.allred@gwmail.gwu.edu |
| Assignments | Before each lecture |
| Final examination | No final examination |
| Place and time | SPH Lecture room 600A, each Friday at 11:00 AM |
The course content contains instructions, called Tasks, that are completed in class or at home. Some only require reading. Other tasks require answering questions or writing code.
Initial in-class tasks
- Python (and coding environment) set up
- GitHub set up
- Cloning a GitHub repository
- Work through the content of this document and the next
A computer program
A computer program is a sequence of written (typed) instructions that allows a computer to perform specified tasks.
A computer program is written in one (or more) computer languages. Such languages can be classified in many ways. One such classification compares compiled languages to interpreted languages. In most languages, the instructions, called the source code, are written in human-readable form. This source code is then either compiled (translated) into instructions specific to a type of computer, or interpreted.
In the case of interpreted languages, another program, called an interpreter, reads the source code and executes it. Python and R are examples of interpreted languages and Julia is an example of a compiled language.
Python
Python is a general-purpose programming language developed by Dutch computer scientist Guido van Rossum. He began work on Python in the late 1980s and published the first version in 1991.
While Python can be used in many programming applications such as game development, app development, web development, and much more, it is arguably best known as the principal language for data analysis, data science, statistics, biostatistics, and machine learning.
Python consistently ranks amongst the most used programming languages. The TIOBE Programming Community index is an indicator of the popularity of programming languages. The index is updated once a month.
Python is an interpreted language as defined above. Lines of code are individually translated to code that a computer can understand and then executed. Since speed is a major benefit of compiled languages, it bears mentioning that Python is comparatively slow to execute. Why, then, is Python so popular?
There is no doubt that the popularity of Python is rooted in its ease of use. It is remarkably easy to learn Python. The simple nature of the syntax means that it reads almost like English sentences. It is very easy to translate an idea into computer code using Python. The translation from idea to code is referred to as computational thinking. Computational thinking is a core skill in research today and applies to the use of any computer language.
Coding environments
Computer code is written in a coding environment. Coding environments range from the bare-bones REPL (read-evaluate-print loop) terminal or command line interfaces to rich development environments such as PyCharm, Visual Studio Code, and many more.
These notes are created using a modern approach to computing, called a notebook. Many languages include the ability to create notebooks. The Wolfram Language was the first language to make use of this and the language’s creator, Stephen Wolfram, calls these computational essays. This document (and all the documents that we will use in this class) is a Jupyter notebook.
Jupyter notebooks started life as IPython notebooks. IPython is a particular flavor of Python and stands for interactive Python. In interactive Python, we write a few lines of code called scripts and execute them one at a time. It is ideal for exploration, especially for data exploration.
IPython notebooks were created for the Python language. Today we use this notebook environment for other languages such as Julia and R, hence the name change to Jupyter notebooks.
Jupyter notebooks are being replaced by JupyterLab. JupyterLab is a more comprehensive environment that includes notebooks alongside other useful coding tools.
Jupyter notebooks and JupyterLab run in a web browser. Many coding environments such as PyCharm, DataSpell, Visual Studio Code, and more can also generate Jupyter notebooks.
As the term notebook suggests, these computational essays allow us to write normal sentences and paragraphs, with styling such as headings and subheadings, and to include images, sounds, videos (such as YouTube), files, and more.
We can also include code in a notebook. The code is written as short scripts and is executed immediately, with the result displayed underneath the code.
Lines of code are written in Code cells. Text, images, and such are entered in Markdown cells. A Jupyter notebook consists of many cells. Either a Code cell or a Markdown cell can be created after or between any cells.
Below we see a Code cell, with the result of the code displayed following the cell.
The newest coding environment is Quarto. Quarto was developed by Posit (previously RStudio), the developers of the RStudio development environment and many of the most used statistics and data science libraries in the R language.
Quarto documents can be created in RStudio, but also in Jupyter notebooks, in Visual Studio Code, and in many other environments. Many computer languages can be used in a Quarto document. This notebook was created in Quarto. Quarto documents can create webpages, portable document format (PDF) files, blogs, presentations, and even textbooks.
One of the main ideas behind Quarto is to power the next generation of research documents, both for research use and, more importantly, for open publication that allows for verification of results and reproducibility.
The Python environment
Python is a computer language with constant updates, currently at version 3.11. It is an open-source language. Open-source means that anyone can view and contribute to the code that makes up the language.
Anyone is also free to add new functionality to the code. The added functionality is usually concentrated in a package. In the code cell above, we imported some packages and specific parts of packages, such as numpy and pandas.
Packages (or libraries) extend the language by introducing more code to the language. There is a tremendously rich ecosystem of packages in Python for working with data of all kinds.
Together with the rich ecosystem, we also find a vibrant community of Python users. Answers to questions we may have are readily available. These are created by Python experts, enthusiasts, academics, and many more. Being open-source, there are countless tutorials and examples on the internet.
Many of today’s great discoveries and analyses are performed using Python and other open-source languages. The first image of a black hole was generated by analysing data using Python.
The open-source nature of Python is in contrast to commercial software, which is increasingly falling behind Python, R, and other free and open languages. Commercial software can be expensive and poorly supported. It can be difficult (and expensive) to find educational resources, with communities being much more closed in general.
The final benefit of using open-source computing tools is that it provides for open and reproducible research. Being able to show how analysis was conducted is becoming very important in our modern era of research.
Python 101
Code comments
In most cases, we write code to review it later or to share it with others. In both cases, it makes sense to leave comments about the code. This will remind you what your code is doing when you review it later and help others understand what you tried to achieve.
When we write lines of code, we can start any line (or part of a line) with the pound or hashtag symbol, #. The Python interpreter will ignore everything in a line following the # symbol. It is therefore ideal to use when leaving comments on our code.
In most of the code cells below, you will notice the use of code comments.
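As a small illustration of the two placements described above, the sketch below uses a comment on its own line and a comment at the end of a line. The arithmetic is arbitrary and only there to give the comments something to describe.

```python
# A comment on its own line: Python ignores this entire line
print(2 + 2)  # A comment at the end of a line: everything after the # is ignored
```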
Arithmetic
We are all familiar with the basic arithmetical operators such as addition and multiplication. Simple mathematical operations in Python are examples of expressions. The actual symbols such as + and - are termed operators. Table 2 below shows a list of these operators.
| Expression | Operator | Example | Result |
|---|---|---|---|
| Adding | + | 2 + 2 | 4 |
| Subtracting | - | 8 - 3 | 5 |
| Multiplying | * | 2 * 2 | 4 |
| Dividing | / | 8 / 4 | 2.0 |
| Integer division | // | 10 // 3 | 3 |
| The remainder | % | 10 % 3 | 1 |
| Exponentiation | ** | 2 ** 4 | 16 |
Addition
We start by adding 2 + 2, which should result in 4. To execute a cell, we can hold down the SHIFT key on our keyboard and then hit the ENTER key on a Windows or Linux machine or RETURN on a Mac. There is also a button above each cell in many coding environments that we could click.
Note the use of spaces between the 2 and the + symbol. This is simply for ease of reading. We could also omit the spaces and write 2+2. Note also the use of a code comment. A code comment is started by a pound symbol or hashtag. Python ignores any text following a pound symbol (for a specific line of code).
More than two numbers can be added in a single expression. Below we add 2 and 2 and 10.
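The additions described in this section can be sketched as follows. The print function is used here only so that every result is shown when the code is run outside a notebook.

```python
# Adding 2 and 2
print(2 + 2)       # 4

# The spaces around the operator are optional
print(2+2)         # 4

# Adding 2 and 2 and 10 in a single expression
print(2 + 2 + 10)  # 14
```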
Subtraction
One number is subtracted from another using the - operator.
More than one number can be used in a subtraction expression.
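A minimal sketch of subtraction follows. The first expression is the example from Table 2; the numbers in the second expression are chosen only for illustration.

```python
# Subtracting 3 from 8
print(8 - 3)       # 5

# Subtracting more than one number; evaluated from left to right
print(10 - 3 - 2)  # 5
```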
Multiplication
Since keyboards do not have a multiplication key, we use the * symbol for this operation. It is usually a part of the 8 key.
Below, we calculate $10 \times 8 \times 2$.
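The product mentioned above can be computed as in this short sketch.

```python
# Multiplying 10 by 8 by 2
print(10 * 8 * 2)  # 160
```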
Division
As with multiplication, we make use of another key for this operation. It is the forward slash key, /.
Below, we divide 10 by 2.
Note that we used integers, but the result is expressed in decimal format. Now, we divide 20 by 8.
Dividing 20 by 8 results in a value with a decimal point. We can use the // operator to return only the whole number (integer) part of the solution.
Since $2 \times 8 = 16$, we have a remainder of 4 (to get to 20). We can express the remainder using the % operator.
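The three division-related operators discussed in this section can be compared side by side, using the same numbers as the text.

```python
# The / operator always returns a value with a decimal point
print(10 / 2)   # 5.0
print(20 / 8)   # 2.5

# The // operator keeps only the whole number part of the division
print(20 // 8)  # 2

# The % operator returns the remainder: 20 - (2 * 8)
print(20 % 8)   # 4
```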
Powers
Consider the exponentiation in Equation 1.
$$2^{3} = 2 \times 2 \times 2 = 8 \tag{1}$$
We use the double asterisk symbol, **, to calculate powers.
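As a short sketch, the power in Equation 1 and the example from Table 2 are calculated as follows.

```python
# 2 raised to the power 3, as in Equation 1
print(2 ** 3)  # 8

# 2 raised to the power 4
print(2 ** 4)  # 16
```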
The order of arithmetical operations
Remember that there is an order to mathematical operations, i.e. division and multiplication come before subtraction and addition. In the expression $3 + 4 \times 2$, the 4 and 2 are multiplied first, resulting in 8. The 3 is then added to yield 11.
Parentheses are used to force the order of operations. Below, we add 3 + 4 first and then multiply the result by 2.
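The difference that parentheses make can be seen by evaluating both versions of the expression.

```python
# Multiplication comes first: 4 * 2 = 8, then 3 + 8
print(3 + 4 * 2)    # 11

# Parentheses force the addition first: 3 + 4 = 7, then 7 * 2
print((3 + 4) * 2)  # 14
```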
The solution can be opened below. Try to complete the exercise before looking at the solution.
Comparison operators
Comparison operators or conditionals are used to return the value of a comparison. The return is one of two Boolean values, True or False, based on the comparison being made.
The data type of these two Boolean values is printed below using the type function. See below under Python data types for more information on the type function and data types in Python.
We see the Python data type of bool. Internally, True is stored as the integer 1 and False as the integer 0. Therefore, we can do arithmetic with the values.
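A minimal sketch of these two points follows: the type of the Boolean values, and arithmetic that relies on True behaving as 1 and False as 0. The particular sums are chosen only for illustration.

```python
# Both Boolean values have the data type bool
print(type(True))   # <class 'bool'>
print(type(False))  # <class 'bool'>

# True behaves as the integer 1 and False as the integer 0
print(True + True)  # 2
print(True * 10)    # 10
print(False + 1)    # 1
```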
The two Boolean values are returned when we use conditionals.
The operators are listed in Table 3.
| Comparison | Operator | True example | False Example |
|---|---|---|---|
| Less than | < | 2 < 4 | 4 < 2 |
| Greater than | > | 4 > 2 | 2 > 4 |
| Less than or equal to | <= | 4 <= 4 | 4 <= 2 |
| Greater than or equal to | >= | 4 >= 4 | 2 >= 4 |
| Is equal to | == | 4 == 4.0 | 4 == 2 |
| Is not equal to | != | 4 != 2 | 4 != 4 |
Below, we run through the examples in Table 3. Follow along by viewing the code comments.
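The comparisons can be sketched as follows, with one expression that returns True and one that returns False for each operator. The numbers follow the pattern of Table 3.

```python
# Less than
print(2 < 4)     # True
print(4 < 2)     # False

# Greater than
print(4 > 2)     # True
print(2 > 4)     # False

# Less than or equal to
print(4 <= 4)    # True
print(4 <= 2)    # False

# Greater than or equal to
print(4 >= 4)    # True
print(2 >= 4)    # False

# Is equal to (the integer 4 and the decimal value 4.0 compare as equal)
print(4 == 4.0)  # True
print(4 == 2)    # False

# Is not equal to
print(4 != 2)    # True
print(4 != 4)    # False
```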
Functions
A lot of what we do in Python requires a function. A function is a named piece of code that takes some input (always provided inside a set of parentheses that directly follows the function name) and gives an output. An input is called an argument (or sometimes a parameter). Below, we pass the argument 'This is easy,' to the function print(). The output is a screen printout of the input.
As we continue, we will learn more and more functions. The functions or keywords in Python have rules of use. This is much like a spoken language. In essence, we are learning a new language. Do not be concerned, though. It is much simpler than learning a new spoken language.
Later in the course, we will also learn how to create our own functions.
Python data types
Everything that we work with in Python has a certain computer data type. This type sets the rules for what we can do.
One helpful function is the type() function. It tells us the Python data type that we are working with. Think of the number 3. In mathematics, it is an integer.
Once we add a decimal point, we change the computer data type. For instance, 3.0 is a decimal value. These are termed floating point values in most computer languages.
Characters and text are termed strings. They are placed in single (or double) quotation marks.
When numbers are placed in quotation marks, they become strings and we can no longer use them for calculations.
Functions such as print also have a type.
The reason that a function has a type is that (almost) everything in Python is an object. To function properly, each object must have a type.
Casting converts an object from one type to another. Some casting is allowed in Python.
In the code cell below, we cast the string '8' to the integer 8.
Where we could not do arithmetical addition between the string '8' and the integer 8, we can do so after casting the string to an integer.
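The data types and the cast described in this section can be sketched as follows. The string 'Python' is an arbitrary example value, and the printed name for the type of print is what the standard CPython interpreter reports.

```python
# An integer, a floating point value, and two strings
print(type(3))         # <class 'int'>
print(type(3.0))       # <class 'float'>
print(type('Python'))  # <class 'str'>
print(type('3'))       # <class 'str'>

# Functions are objects too, so they also have a type
print(type(print))     # <class 'builtin_function_or_method'>

# The expression '8' + 8 would stop with a TypeError, because a string
# and an integer cannot be added. Casting the string first makes it work.
print(int('8') + 8)    # 16
```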
Variables
Computer variables, or variables for short, are the containers for objects. Objects can be assigned to a variable. This action creates a space in computer memory in which the information about the object, including its type and value, is stored, at least while the Python program or file is active.
We assign a human-created name to a variable. These should be descriptive of what is contained in the variable, such that when we view a file later, or share it with others, everyone can deduce what is contained in the variable.
An object is assigned to a variable by using the assignment operator, which is the symbol =.
In the code cell below, we create the variable my_variable and assign a string object to it. The string object contains the value I like Python.
We can now call the object contained in the variable by using the variable name.
Note that in a scripting environment such as a notebook, we do not have to use the print function.
The assignment operator is not used in the same way as we use it in mathematics. The assignment operator simply assigns what is to its right, to what is to its left.
Below, we create the variable i and assign the value 1 to it. We then call the variable.
Now, we add 1 to the object contained in the variable i. It is clear that the assignment operator is not used as an equal symbol.
On the right-hand side of the assignment operator we have i + 1. When the code is executed, i contains the value 1. We then add 1 to it, which gives 2. This is then assigned to the variable that is on the left-hand side of the assignment operator. On the left, we have a variable that already contains an object. Python allows the content of a variable to be overwritten. The variable i therefore then contains the integer object with a value of 2.
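The assignments described in this section can be sketched as follows. The print function is used so that the values are shown when the code is run outside a notebook.

```python
# Assign a string object to the variable my_variable
my_variable = 'I like Python'
print(my_variable)

# Assign the integer 1 to the variable i
i = 1
print(i)  # 1

# The right-hand side (i + 1) is evaluated first, then assigned back to i
i = i + 1
print(i)  # 2
```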
Code flow control
We can control the flow of execution of code based on conditions, using comparison operators.
We start by considering loops. Loops allow us to run over some code many times, depending on some criteria. Here, we investigate for and while loops.
The range function is convenient for generating sequences. With a single integer argument it generates a sequence of integers that starts at 0 and increments in steps of 1 until the specified value minus 1 is reached (the specified value is not included). The function returns a range object that can be used for iteration (over its elements).
We can loop over the integer values in a range. Below, we see an example of a for loop. We generate a loop counter, i, which will loop over the values specified in the range function. Each of the values in the range is printed during the iteration (loop).
Once all the loops are completed, the code execution halts.
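A minimal sketch of a for loop over a range follows. The argument 5 is an assumption chosen for illustration, and the list function is used only to display the values that the range object holds.

```python
# A range object with the single argument 5 holds the integers 0 through 4
print(list(range(5)))  # [0, 1, 2, 3, 4]

# The loop counter i takes each value in the range, one per iteration
for i in range(5):
    print(i)
```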
We can also loop over code while a condition is met. We start with a counter and then use a while loop, which continues for as long as its condition returns True. More precisely, the loop terminates when the condition returns a False value.
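A minimal sketch of such a while loop follows. The starting value of 0 and the condition i < 5 are assumptions, chosen to be consistent with the printed values 0 through 4.

```python
# Start the counter at 0
i = 0

# Keep looping for as long as the condition i < 5 returns True
while i < 5:
    print(i)
    i += 1  # Increase the counter by 1, otherwise the loop never ends
```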
0
1
2
3
4
Note that in the code cell above, we used the shorthand syntax i += 1. This is the same as i = i + 1.
When the condition returns a False value, the while loop is exited.
The if, elif, and else statements allow us to branch the execution of code. Depending on the set conditions, different code can be executed.
In the code cell below, we create a variable named variable and assign the integer value 10 to it.
As an example of using an if statement, we want to consider if the integer value in variable is larger than 5. If it is, we want to print the statement The value is larger than 5 to the screen. If the integer value in variable is less than or equal to 5, we want to print The value is not larger than 5 to the screen.
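The branching described above can be sketched as follows, with the assignment of 10 to variable repeated so that the code runs on its own.

```python
# Assign the integer 10 to the variable named variable
variable = 10

# Print one of two statements, depending on the condition
if variable > 5:
    print('The value is larger than 5.')
else:
    print('The value is not larger than 5.')
```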
The value is larger than 5.
We use the keyword if and then state a condition, followed by a colon symbol, :. The new line is indented and contains the code for execution if the condition in the if statement holds (returns as True). A new line uses the else keyword (in vertical alignment with the if keyword). The else keyword is also followed by a colon and then a new indented line containing a final statement to be executed if the prior condition was not met.
We can also add an elif (short for else if) statement. As an example, we might want to print The value is exactly 5 if such a condition holds. The elif statement (in vertical alignment with the if statement) is followed by a new conditional, a colon, and then a new indented line. Here, we add the code to be executed if the elif conditional holds.
Many elif statements can be added. We have a final else statement as before. The statement following the else keyword is executed if none of the prior conditionals is met.
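A minimal sketch with an elif branch follows. The value 10 and the message in the else branch are assumptions made for illustration.

```python
# Assign the integer 10 to the variable named variable
variable = 10

# The conditions are checked from top to bottom; only the first one
# that returns True has its code executed
if variable > 5:
    print('The value is larger than 5')
elif variable == 5:
    print('The value is exactly 5')
else:
    print('The value is smaller than 5')  # Hypothetical message
```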
The value is larger than 5
Note the order of the choice of the conditionals. It is a good idea to think through the problem before creating an if statement.
We will take a closer look at strings in the next notebook. Below, we see the in keyword. It acts as a conditional and can verify if a string is contained in another string. Below, we see that the word 'Python' is contained in the string 'Python is a computer language.'
We see that the string 'python' is not in the string 'Python is a computer language.' The uppercase P and lowercase p are not the same character.
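Both checks can be reproduced with the in keyword, as in this short sketch.

```python
# 'Python' (uppercase P) is contained in the sentence
print('Python' in 'Python is a computer language.')  # True

# 'python' (lowercase p) is not, since the comparison is case sensitive
print('python' in 'Python is a computer language.')  # False
```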
Lab assignment
[30 points]